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Abstract. As proofs in type theory become increasingly complex, there 
is a growing need to provide better proof automation. This paper shows 
how to implement a Prolog-style resolution procedure in the dependently 
typed programming language Agda. Connecting this resolution proce- 
dure to Agda’s reflection mechanism provides a first-class proof search 
tactic for first-order Agda terms. As a result, writing proof automation 
tactics need not be different from writing any other program. 


1 Introduction 

Writing proof terms in type theory is hard and often tedious. Interactive proof 
assistants based on type theory, such as Agda [Norell, 2007] or Coq [2004], take 
very different approaches to facilitating this process. 

The Coq proof assistant has two distinct language fragments. Besides the 
programming language Gallina, there is a separate tactic language for writing 
and programming proof scripts. Together with several highly customizable tac- 
tics, the tactic language Ltac can provide powerful proof automation [Chlipala, 
2013]. Having to introduce a separate tactic language, however, seems at odds 
with the spirit of type theory, where a single language is used for both proof 
and computation. Having a separate language for programming proofs has its 
drawbacks: programmers need to learn another language to automate proofs, 
debugging Ltac programs can be difficult, and the resulting proof automation 
may be inefficient [Braibant, 2012]. 

Agda does not have Coq’s segregation of proof and programming language. 
Instead, programmers are encouraged to automate proofs by writing their own 
solvers [Norell, 2009b], In combination with Agda’s reflection mechanism [Agda 
development team, 2013, van der Walt and Swierstra, 2013], developers can 
write powerful automatic decision procedures [Allais, 2010]. Unfortunately, not 
all proofs are easily automated in this fashion. If this is the case, the user is 
forced to interact with the integrated development environment and manually 
construct a proof term step by step. 

This paper tries to combine the best of both worlds by implementing a library 
for proof search within Agda itself. In other words, we have defined a program 
for the automatic construction of mathematical proofs. More specifically, this 
paper makes several novel contributions: 



— We show how to implement a Prolog interpreter in the style of Stutterheim 
et al. [2013] in Agda (Section 3). Note that, in contrast to Agda, resolving a 
Prolog query need not terminate. Using coinduction, however, we can write 
an interpreter for Prolog that is total. 

— Resolving a Prolog query results in a substitution that, when applied to the 
goal, produces a solution in the form of a term that can be derived from the 
given rules. We extend our interpreter to also produce a trace of the applied 
rules, which enables it to produce a proof term that shows the resulting 
substitution is valid. 

— We integrate this proof search algorithm with Agda’s reflection mechanism 
(Section 4). This enables us to quote the type of a lemma we would like to 
prove, pass this term as the goal of our proof search algorithm, and finally, 
unquote the resulting proof term, thereby proving the desired lemma. 

Although Agda already has built-in proof search functionality [Lindblad and 
Benke, 2004], our approach has several key advantages over most existing ap- 
proaches to proof automation: 


— Our library is highly customizable. We may parametrize our tactics over the 
search depth, hint database, or search strategy. Each of these is itself a first- 
class Agda value, that may be inspected or transformed, depending on the 
user’s needs. 

— Although we limit ourself in the paper to a simple depth-first search, different 
proofs may require a different search strategy. Such changes are easily made 
in our library. To illustrate this point, we will develop a variation of our 
tactic which allows the user to limit the number of times certain rules may 
be applied (Section 5). 

— Users need not learn a new programming language to modify existing tactics 
or develop tactics of their own. They can use a full-blown programming 
language to define their tactics, rather than restrict themselves to a domain- 
specific tactic language such as Ltac. 

— Finally, users can use all the existing Agda technology for testing and debug- 
ging programs when debugging the generation of proofs. Debugging complex 
tactics in Coq requires a great deal of expertise - we hope that implementing 
tactics as a library will make this process easier. 

We will compare our library with the various alternative forms of proof automa- 
tion in greater depth in Section 6, after we have presented our development. 

All the code described in this paper is freely available from GitHub. 1 It is 
important to emphasize that all our code is written in the safe fragment of Agda: 
it does not depend on any postulates or foreign functions; all definitions pass 
Agda’s termination checker; and all metavariables are resolved. 


1 See https://github.com/pepijnkokke/AutoInAgda. 



2 Motivation 


Before describing the implementation of our library, we will provide a brief intro- 
duction to Agda’s reflection mechanism and illustrate how the proof automation 
described in this paper may be used. 


Reflection in Agda 

Agda has a reflection mechanism 2 for compile time metaprogramming in the 
style of Lisp [Pitman, 1980], MetaML [Taha and Sheard, 1997], and Template 
Haskell [Sheard and Peyton Jones, 2002]. This reflection mechanism makes it 
possible to convert a program fragment into its corresponding abstract syntax 
tree and vice versa. We will introduce Agda’s reflection mechanism here with 
several short examples, based on the explanation in previous work [van der Walt 
and Swierstra, 2013]. A more complete overview can be found in the Agda release 
notes [Agda development team, 2013] and Van der Walt’s thesis [2012], 

The type Term : Set is the central type provided by the reflection mechanism. 
It defines an abstract syntax tree for Agda terms. There are several language 
constructs for quoting and unquoting program fragments. The simplest example 
of the reflection mechanism is the quotation of a single term. In the definition 
of idTerm below, we quote the identity function on Boolean values. 

idTerm : Term 

idTerm = quoteTerm (A (x : Bool) — > x) 

When evaluated, the idTerm yields the following value: 

lam visible (var 0 []) 

On the outermost level, the lam constructor produces a lambda abstraction. It 
has a single argument that is passed explicitly (as opposed to Agda’s implicit 
arguments). The body of the lambda consists of the variable identified by the 
De Bruijn index 0, applied to an empty list of arguments. 

The quote language construct allows users to access the internal represen- 
tation of an identifier, a value of a built-in type Name. Users can subsequently 
request the type or definition of such names. 

Dual to quotation, the unquote mechanism allows users to splice in a Term, 
replacing it with its concrete syntax. For example, we could give a convoluted 
definition of the K combinator as follows: 

const : V{AB}->A->B->A 

const = unquote (lam visible (lam visible (var 1 []))) 

2 Note that Agda’s reflection mechanism should not be confused with ‘p ro °f by re- 
flection’ - the technique of writing a verified decision procedure for some class of 
problems. 



The language construct unquote is followed by a value of type Term. In this 
example, we manually construct a Term representing the K combinator and splice 
it in the definition of const. The unquote construct then type-checks the given 
term, and turns it into the definition A x — > A y — > x. 

The final piece of the reflection mechanism that we will use is the quoteGoal 
construct. The usage of quoteGoal is best illustrated with an example: 

goallnHole : N 

goallnHole = quoteGoal g in { }0 

In this example, the construct quoteGoal g binds the Term representing the type 
of the current goal, N, to the variable g. When completing this definition by 
filling in the hole labeled 0, we may now refer to the variable g. This variable is 
bound to def N [], the Term representing the type N. 


Using proof automation 

To illustrate the usage of our proof automation, we begin by defining a predicate 
Even on natural numbers as follows: 

data Even : N — ► Set where 
isEvenO : Even 0 

isEven+2 : V { n } — »• Even n — ► Even (sue (sue n)) 

Next we may want to prove properties of this definition: 

even+ : Even n — ► Even m — ► Even (n + m) 

even+ isEvenO e2 = e2 

even+ (isEven+2 el) e2 = isEven+2 (even+ el e2) 

Note that we omit universally quantified implicit arguments from the typeset ver- 
sion of this paper, in accordance with convention used by Haskell [Peyton Jones, 
2003] and Idris [Brady, 2013]. 

As shown by Van der Walt and Swierstra [2013], it is easy to decide the 
Even property for closed terms using proof by reflection. The interesting terms, 
however, are seldom closed. For instance, if we would like to use the even-|- lemma 
in the proof below, we need to call it explicitly. 

trivial : Even n — > Even (n + 2) 
trivial e = even-|- e (isEven+2 isEvenO) 

Manually constructing explicit proof objects in this fashion is not easy. The 
proof is brittle. We cannot easily reuse it to prove similar statements such 
as Even (n + 4). If we need to reformulate our statement slightly, proving 
Even (2 + n) instead, we need to rewrite our proof. Proof automation can make 
propositions more robust against such changes. 

Coq’s proof search tactics, such as auto, can be customized with a hint 
database , a collection of related lemmas. In our example, auto would be able 



to prove the trivial lemma, provided the hint database contains at least the 
constructors of the Even data type and the even+ lemma. In contrast to the 
construction of explicit proof terms, changes to the theorem statement need 
not break the proof. This paper shows how to implement a similar tactic as an 
ordinary function in Agda. 

Before we can use our auto function, we need to construct a hint database: 

hints : HintDB 

hints = e « quote isEvenO « quote isEven+2 « quote even+ 

To construct such a database, we use quote to obtain the names of any terms 
that we wish to include in it and pass them to the right-hand side of the _«_ 
function, which will insert them into a hint database to the left. Note that 6 
represents the empty hint database. We will describe the implementation of 
_«_ in more detail in Section 4. For now it should suffice to say that, in the 
case of even+, after the quote construct obtains an Agda Name, _«_ uses the 
built-in function type to look up the type associated with even-)-, and generates a 
derivation rule which states that given two proofs of Even n and Even m, applying 
the rule even-|- will result in a proof of Even (n + m). 

Note, however, that unlike Coq, the hint data base is a first-class value that 
can be manipulated, inspected, or passed as an argument to a function. 

We now give an alternative proof of the trivial lemma using the auto tactic 
and the hint database defined above: 

trivial : Even n — > Even (n + 2) 

trivial = quoteGoal g in unquote (auto 5 hints g) 

Or, using the newly added Agda tactic syntax 3 : 

trivial : Even n — »• Even (n + 2) 

trivial = tactic (auto 5 hints) 

The notation tactic f is simply syntactic sugar for quoteGoal g in unquote (f g), 
for some function f. 

The central ingredient is a function auto with the following type: 

auto : (depth : N) — » HintDB — ► Term — > Term 

Given a maximum depth, hint database, and goal, it searches for a proof Term 
that witnesses our goal. If this term can be found, it is spliced back into our 
program using the unquote statement. 

Of course, such invocations of the auto function may fail. What happens if 
no proof exists? For example, trying to prove Even n — ► Even (n + 3) in this 
style gives the following error: 

Exception searchSpaceExhausted ! =< 

Even .n -> Even (.n + 3) of type Set 

Syntax for Agda tactics was added in Agda 2.4.2. 


3 



When no proof can be found, the auto function generates a dummy term with a 
type that explains the reason the search has failed. In this example, the search 
space has been exhausted. Unquoting this term, then gives the type error message 
above. It is up to the programmer to fix this, either by providing a manual proof 
or diagnosing why no proof could be found. 

Overview The remainder of this paper describes how the auto function is imple- 
mented. Before delving into the details of its implementation, however, we will 
give a high-level overview of the steps involved: 

1. The tactic keyword converts the goal type to an abstract syntax tree, i.e., 
a value of type Term. In what follows we will use AgTerm to denote such 
terms, to avoid confusion with the other term data type that we use. 

2. Next, we check the goal term. If it has a functional type, we add the argu- 
ments of this function to our hint database, implicitly introducing additional 
lambdas to the proof term we intend to construct. At this point we check that 
the remaining type and all its original arguments are are first-order. If this 
check fails, we produce an error message, not unlike the searchSpaceExhausted 
term we saw above. We require terms to be first-order to ensure that the uni- 
fication algorithm, used in later steps for proof search, is decidable. If the 
goal term is first-order, we convert it to our own term data type for proof 
search, PsTerm. 

3. The key proof search algorithm, presented in the next section, then tries 
to apply the hints from the hint database to prove the goal. This process 
coinductively generates a (potentially infinite) search tree. A simple bounded 
depth-first search through this tree tries to find a series of hints that can be 
used to prove the goal. 

4. If such a proof is found, this is converted back to an AgTerm; otherwise, 
we produce an erroneous term describing that the search space has been 
exhausted. Finally, the unquote keyword type checks the generated AgTerm 
and splices it back into our development. 

The rest of this paper will explain these steps in greater detail. 

3 Proof search in Agda 

The following section describes our implementation of proof search a la Prolog 
in Agda. This implementation abstracts over two data types for names — one for 
inference rules and one for term constructors. These data types will be referred 
to as RuleName and TermName, and will be instantiated with concrete types 
(with the same names) in section 4. 


Terms and unification 

The heart of our proof search implementation is the structurally recursive unifi- 
cation algorithm described by McBride [2003] . Here the type of terms is indexed 



by the number of variables a given term may contain. Doing so enables the for- 
mulation of the unification algorithm by structural induction on the number of 
free variables. For this to work, we will use the following definition of terms 

data PsTerm (n : N) : Set where 
var : Fin n — ► PsTerm n 

con : TermName — > List (PsTerm n) — »• PsTerm n 

We will use the name PsTerm to stand for proof search term to differentiate 
them from the terms from Agda’s reflection mechanism, AgTerm. In addition to 
variables, represented by the finite type Fin n, we will allow first-order constants 
encoded as a name with a list of arguments. 

For instance, if we choose to instantiate TermName with the following Arith 
data type, we can encode numbers and simple arithmetic expressions: 

data Arith : Set where 
Sue : Arith 
Zero : Arith 
Add : Arith 

The closed term corresponding to the number one could be written as follows: 

One : PsTerm 0 

One = con Sue (con Zero [] :: []) 

Similarly, we can use the var constructor to represent open terms, such as x + 1. 
We use the prefix operator ff to convert from natural numbers to finite types: 

AddOne : PsTerm 1 

AddOne = con Add (var (ff 0) :: One :: []) 

Note that this representation of terms is untyped. There is no check that en- 
forces addition is provided precisely two arguments. Although we could add 
further type information to this effect, this introduces additional overhead with- 
out adding safety to the proof automation presented in this paper. For the sake 
of simplicity, we have therefore chosen to work with this untyped definition. 

We shall refrain from further discussion of the unification algorithm itself. 
Instead, we restrict ourselves to presenting the interface that we will use: 

unify : (ti X .2 : PsTerm m) — » Maybe (3 [n] Subst m n) 

The unify function takes two terms ti and t 2 and tries to compute a substitution — 
the most general unifier. Substitutions are indexed by two natural numbers m 
and n. A substitution of type Subst m n can be applied to a PsTerm m to produce 
a value of type PsTerm n. As unification may fail, the result is wrapped in the 
Maybe type. In addition, since the number of variables in the terms resulting 
from the unifying substitution is not known a priori, this number is existentially 
quantified over. For the remainder of the paper, we will write 3 [x] B to mean a 
type B with occurrences of an existentially quantified variable x, or 3 (A x — > B ) 
in full. 



Inference rules 


The hints in the hint database will form inference rules that we may use to prove 
a goal term. We represent such rules as records containing a rule name, a list of 
terms for its premises, and a term for its conclusion: 

record Rule (n : N) : Set where 
field 

name : RuleName 
premises : List (PsTerm n) 
conclusion : PsTerm n 
arity : N 

arity = length premises 

Once again the data-type is quantified over the number of variables used in the 
rule. Note that the number of variables in the premises and the conclusion is the 
same. 

Using our newly defined Rule type we can give a simple definition of addition. 
In Prolog, this would be written as follows. 

add(0, X, X). 

add(suc (X) , Y, suc(Z)) add(X, Y, Z) . 

Unfortunately, the named equivalents in our Agda implementation given in Fig- 
ure 1 are a bit more verbose. Note that we have, for the sake of this example, 
instantiated the RuleName and TermName to String and Arith respectively. 

A hint database is nothing more than a list of rules. As the individual rules 
may have different numbers of variables, we existentially quantify these: 

HintDB : Set 

HintDB = List (3 [n] Rule n) 


Generalised injection and raising 

Before we can implement some form of proof search, we need to define a pair 
of auxiliary functions. During proof resolution, we will work with terms and 
rules containing a different number of variables. We will use the following pair 
of functions, inject and raise, to weaken bound variables, that is, map values of 
type Fin n to some larger finite type. 

inject : V{m}n— >-Finm— s>Fin(m + n) 

inject n zero = zero 

inject n (sue i) = sue (inject n i) 

raise : V m { n } — » Fin n — ► Fin (m + n) 

raise zero i = i 

raise (sue m) i = sue (raise m i) 



AddBase : Rule 1 
Ad dBase = record { 

name = "AddBase" 

conclusion = con Add ( con Zero [] 

:: var(# 0) 

:: var(# 0) 

::[]) 

premises = [ 

} 

AddStep : Rule 3 
AddStep = record { 

name = "AddStep" 

conclusion = con Add ( con Sue (var (# 0) :: []) 

:: var (# 1) 

:: con Sue (var (# 2) :: []) 
::[]) 

premises = con Add ( var (# 0) 

::var(# 1) 

::var(# 2) 

::[]) 


} 


Fig. 1 . Agda representation of example rules 


On the surface, the inject function appears to be the identity. When you make 
all the implicit arguments explicit, however, you will see that it sends the zero 
constructor in Fin m to the zero constructor of type Fin (m + n). Hence, the inject 
function maps Fin m into the first m elements of the type Fin (m + n). Dually, 
the raise function maps Fin n into the last n elements of the type Fin (m + n) by 
repeatedly applying the sue constructor. 

We can use inject and raise to define similar functions that work on our Rule 
and PsTerm data types, by mapping them over all the variables that they contain. 


Constructing the search tree 


Our proof search procedure is consists of two steps. First, we coinductively con- 
struct a (potentially infinite) search space; next, we will perform a bounded 
depth-first traversal of this space to find a proof of our goal. 

We will represent the search space as a (potentially) infinitely deep, but 
finitely branching rose tree. 



data SearchTree (A : Set) : Set where 
leaf : A — > SearchTree A 
node : List (oo (SearchTree A)) — » SearchTree A 

We will instantiate the type parameter A with a type representing proof terms. 
These terms consist of applications of rules, with a sub-proof for every premise. 

data Proof : Set where 

con : (name : RuleName) (args : List Proof) — > Proof 

Unfortunately, during the proof search we will have to work with partially com- 
plete proof terms. 

Such partial completed proofs are represented by the PartialProof type. In 
contrast to the Proof data type, the PartialProof type may contain variables, 
hence the type takes an additional number as its argument: 

PartialProof : N — > Set 

PartialProof m = 3 [k] Vec (PsTerm m) k x (Vec Proof k — > Proof) 

A value of type PartialProof m records three separate pieces of information: 

— a number k, representing the number of open subgoals; 

— a vector of length k, recording the subgoals that are still open; 

— a function that, given a vector of k proofs for each of the subgoals, will 
produce a complete proof of the original goal. 

Next, we define the following function to help construct partial proof terms: 

apply : (r : Rule n) — > Vec Proof (arity r + k) — > Vec Proof (sue k) 
apply r xs = new :: rest 

where 

new = con (name r) (toList (take (arity r) xs)) 
rest = drop (arity r) xs 

Given a Rule and a list of proofs of subgoals, this apply function takes the required 
sub-proofs from the vector, and creates a new proof by applying the argument 
rule to these sub-proofs. The result then consists of this new proof, together with 
any unused sub-proofs. This is essentially the ‘unflattening’ of a rose tree. 

We can now finally return to our proof search algorithm. The solveAcc func- 
tion forms the heart of the search procedure. Given a hint database and the 
current partially complete proof, it produces a SearchTree containing completed 
proofs. 

solveAcc : HintDB — » PartialProof (6 + m) -4 SearchTree Proof 

solveAcc rules (0 , [],p) = leaf (p []) 

solveAcc rules (sue k, g :: gs, p) = node (map step rules) 

If there are no remaining subgoals, i.e., the list in the second component of the 
PartialProof is empty, the search is finished. We construct a proof p [], and wrap 



this in the leaf constructor of the SearchTree. If we still have open subgoals, we 
have more work to do. In that case, we will try to apply every rule in our hint 
database to resolve this open goal — our rose tree has as many branches as there 
are hints in the hint database. The real work is done by the step function, locally 
defined in a where clause, that given the rule to apply, computes the remainder 
of the SearchTree. 

Before giving the definition of the step function, we will try to provide some 
intuition. Given a rule, the step function will try to unify its conclusion with 
the current subgoal g. When this succeeds, the premises of the rule are added 
to the list of open subgoals. When this fails, we return a node with no children, 
indicating that applying this rule can never prove our current goal. 

Carefully dealing with variables, however, introduces some complication, as 
the code for the step function illustrates: 

step : 3 [5] (Rule 8 ) —> oo (SearchTree Proof) 
step ( 6 , r) 

with unify (inject 8 g) (raise m (conclusion r) ) 

... | nothing = jj node [] 

... | just(n,mgu) = jj solveAcc prf 

where 

prf : Partial Proof n 

prf = arity r + k, gs', (p o apply r) 

where 

gs' : Vec (Goal n) (arity r + k) 

gs' = map (sub mgu) (raise m (fromList (premises r)) -H- inject 8 gs) 

Note that we use the function sub to apply a substitution to a term. This function 
is defined by McBride [2003]. 

The rule given to the step function may have a number of free variables of 
its own. As a result, all goals have to be injected into a larger domain which 
includes all current variables and the new rule’s variables. The rule’s premises 
and conclusion are then also raised into this larger domain, to guarantee freshness 
of the rule variables. 

The definition of the step function attempts to unify the current subgoal g 
and conclusion of the rule r. If this fails, we can return node [] immediately. If 
this succeeds, however, we build up a new partial proof, prf. This new partial 
proof, once again, consists of three parts: 

— the number of open subgoals is incremented by arity r, i.e., the number of 
premises of the rule r. 

— the vector of open subgoals gs is extended with the premises of r, after 
weakening the variables of appropriately. 

— the function producing the final Proof object will, given the proofs of the 
premises of r, call apply r to create the desired con node in the final proof 
object. 

The only remaining step, is to kick-off our proof search algorithm with a 
partial proof, consisting of a single goal. 



solve : (goal : PsTerm m) — > HintDB — ► SearchTree Proof 
solve g rules = solveAcc (l,g :: [ ] , head) 


Searching for proofs 

After all this construction, we are left with a simple tree structure, which we can 
traverse in search of solutions. For instance, we can define a bounded depth- first 
traversal. 

dfs : (depth : N) — > SearchTree A — > List A 

dfs zero _ = [ 

dfs (sue k) (leaf x) = return x 

dfs (sue k) (node xs) = concatMap (Ax-> dfs k (b x)) xs 

It is fairly straightforward to define other traversal strategies, such as a breadth- 
first search. Similarly, we could define a function which traverses the search tree 
aided by some heuristic. We will explore further variations on search strategies 
in Section 5. 


4 Adding reflection 

To complete the definition of our auto function, we still need to convert between 
Agda’s built-in AgTerm data type and the data type required by our unification 
and resolution algorithms, PsTerm. Similarly, we will need to transform the Proof 
produced by our solve function to an AgTerm that can be unquoted. These are 
essential pieces of plumbing, necessary to provide the desired proof automation. 
While not conceptually difficult, this does expose some of the limitations and de- 
sign choices of the auto function. If you are unfamiliar with the precise workings 
of the Agda reflection mechanism, you may want to skim this section. 

The first thing we will need are concrete definitions for the TermName and 
RuleName data types, which were parameters to the development presented in 
the previous section. It would be desirable to identify both types with Agda’s 
Name type, but unfortunately Agda does not assign a name to the function space 
type operator, nor does Agda assign names to locally bound variables. 

To address this, we define two new data types TermName and RuleName. 

First, we define the TermName data type. 

data TermName : Set where 
name : Name — > TermName 
pvar : N — ► TermName 
impl : TermName 

The TermName data type has three constructors. The name constructor embeds 
Agda’s built-in Name in the TermName type. The pvar constructor describes 
locally bound variables, represented by their De Bruijn index. Note that the 



pvar constructor has nothing to do with PsTerrrVs var constructor: it is not used 
to construct a Prolog variable, but rather to be able to refer to a local variable 
as a Prolog constant. Finally, impl explicitly represents the Agda function space. 
We define the RuleName type in a similar fashion. 

data RuleName : Set where 
name : Name — > RuleName 
rvar : N — > RuleName 

The rvar constructor is used to refer to Agda variables as rules. Its argument i 
corresponds to the variable’s De Bruijn index - the value of i can be used directly 
as an argument to the var constructor of Agda’s Term data type. 

As we have seen in Section 2, the auto function may fail to find the desired 
proof. Furthermore, the conversion from AgTerm to PsTerm may also fail for 
various reasons. To handle such errors, we will work in the Error monad defined 
below: 

Error : (A : Set a) — ► Set a 
Error A = Message l±l A 

Upon failure, the auto function will produce an error message. The corresponding 
Message type simply enumerates the possible sources of failure: 

data Message : Set where 

searchSpaceExhausted : Message 
unsupportedSyntax : Message 

The meaning of each of these error messages will be explained as we encounter 
them in our implementation below. 

Finally, we will need one more auxiliary function to manipulate bound vari- 
ables. The match function takes two bound variables of types Fin m and Fin n 
and computes the corresponding variables in Fin (m U n) - where m U n denotes 
the maximum of m and n: 

match : Fin m — > Fin n — ► Fin (m U n) x Fin (m U n) 

The implementation is reasonably straightforward. We compare the numbers n 
and m, and use the inject function to weaken the appropriate bound variable. It 
is straightforward to use this match function to define similar operations on two 
terms or a term and a list of terms. 


Constructing terms 

We now turn our attention to the conversion of an AgTerm to a PsTerm. There 
are two problems that we must address. 

First of all, the AgTerm type represents all (possibly higher-order) terms, 
whereas the PsTerm type is necessarily first-order. We mitigate this problem by 



allowing the conversion to ‘fail’, by producing a term of the type Exception, as 
we saw in the introduction. 

Secondly, the AgTerm data type uses natural numbers to represent variables. 
The PsTerm data type, on the other hand, represents variables using a finite type 
Fin n, for some n. To convert between these representations, the function keeps 
track of the current depth, i.e. the number of ri-types it has encountered, and 
uses this information to ensure a correct conversion. We sketch the definition of 
the main function below: 

convert : (binders : N) — » AgTerm — » Error (3 PsTerm) 
convert b (var i []) = inj2 (convertVar b i) 

convert b (con n args) = convertName n o convert b ($) args 
convert b (def n args) = convertName n o convert b ($) args 
convert b (pi (arg (arg-info visible _) (el _ ti)) (el _ t2)) 
with convert b ti | convert (sue b) t2 
... | inji msg | _ = inji msg 

... | _ | inji msg = inji msg 

... j inj 2 (ni,pi) j inj 2 (n 2 ,p 2 ) 

with match pi p 2 

■■■ I (Pi / ,P 2 / ) = inj 2 (ni U n 2 ,con impl (pi' :: p 2 ' :: [])) 

convert b (pi (arg ) (el _t 2 )) = convert (sue b) t 2 

convert b _ = inji unsupportedSyntax 

We define special functions, convertVar and name2term, to convert variables and 
constructors or defined terms respectively. The arguments to constructors or 
defined terms are processed using the convertChildren function defined below. 
The conversion of a pi node binding an explicit argument proceeds by converting 
the domain and then codomain. If both conversions succeed, the resulting terms 
are matched and a PsTerm is constructed using impl. Implicit arguments and 
instance arguments are ignored by this conversion function. Sorts, levels, or any 
other Agda feature mapped to the constructor unknown of type Term triggers a 
failure with the message unsupportedSyntax. 

The convertChildren function converts a list of Term arguments to a list of Pro- 
log terms, by stripping the arg constructor and recursively applying the convert 
function. We only give its type signature here, as the definition is straightfor- 
ward: 

convertChildren : N — > List (Arg Term) — > Error (3 (List o PsTerm)) 

To convert between an AgTerm and PsTerm we simply call the convert func- 
tion, initializing the number of binders encountered to 0. 

agda2term : AgTerm — ► Error (3 PsTerm) 
agda2term t = convert 0 t 



Constructing rules 


Our next goal is to construct rules. More specifically, we need to convert a list 
of quoted Names to a hint database of Prolog rules. To return to our example in 
Section 2, the definition of even+ had the following type: 

even+ : Even n — t Even m — > Even (n + m) 

We would like to construct a value of type Rule that expresses how even+ can 
be used. In Prolog, we might formulate the lemma above as the rule: 

even(add(M,N) ) :- even(M) , even(N) . 

In our Agda implementation, we can define such a rule manually: 

Even+ : Rule 2 
Even+ = record { 

name = name even+ 

conclusion = con (name (quote Even)) ( 

con (name (quote _+_)) (var (# 0) :: var (# 1) :: []) 

::[] 

) 

premises = con (name (quote Even)) (var (# 0) :: []) 

:: con (name (quote Even)) (var (# 1) :: []) 

::[] 

} 

In the coming subsection, we will show how to generate the above definition from 
the Name representing even+. 

This generation of rules is done in two steps. First, we will convert a Name 
to its corresponding PsTerm: 

name2term : Name — t Error (3 PsTerm) 
name2term = agda2term o unel o type 

The type construct maps a Name to the AgTerm representing its type; the unel 
function discards any information about sorts; the agda2term was defined previ- 
ously. 

In the next step, we process this PsTerm. The split function, defined below, 
splits a PsTerm at every top- most occurrence of the function symbol impl. Note 
that it would be possible to define this function directly on the AgTerm data 
type, but defining it on the PsTerm data type is much cleaner as we may assume 
that any unsupported syntax has already been removed. 

split : PsTerm n — t 3 (A k — > Vec (PsTerm n) (sue k)) 

split (con impl (ti :: t 2 :: [])) = Product. map sue (_::_ ti) (split t 2 ) 

splitt = (0,t :: []) 

Using all these auxiliary functions, we now define the name2rule function 
below that constructs a Rule from an Agda Name. 



name2rule : Name — » Error (3 Rule) 
name2rule nm with name2term nm 
... | inji msg = inji msg 

... | inj 2 (n, t) with split t 

... | (k, ts) with initLast ts 

... | (prems, concl, _) = inj 2 (n, rule (name nm) concl (toList prems)) 

We convert a name to its corresponding PsTerm, which is converted to a vector 
of terms using split. The last element of this vector is the conclusion of the rule; 
the prefix constitutes the premises. We use the initLast function from the Agda 
standard library, to decompose this vector accordingly. 

Constructing goals 

Next, we turn our attention to converting a goal AgTerm to a PsTerm. While 
we could use the agda2term function to do so, there are good reasons to explore 
other alternatives. 

Consider the example given in Section 2. The goal AgTerm we wish to prove 
is Even n — ► Even (n + 2). Calling agda2term would convert this to a PsTerm, 
where the function space has been replaced by the constructor impl. Instead, 
however, we would like to introduce arguments, such as Even n, as assumptions 
to our hint database. 

In addition, we cannot directly reuse the implementation of convert that was 
used in the construction of terms. The convert function maps every AgTerm 
variable is mapped to a Prolog variable that may still be instantiated. When 
considering the goal type, however, we want to generate skolem constants for 
our variables. To account for this difference we have two flavours of the convert 
function: convert and convert4Goal. Both differ only in their implementation of 
convertVar. 

agda2goalx premises : AgTerm — > Error (3 PsTerm x HintDB) 

agda2goalx premises t with convert4Goal 0 t 

... | inji msg = inji msg 

... | inj 2 (n,p) with split p 

... | (k, ts) with initLast ts 

... | (prems, goal, _) = inj 2 ((n, goal), toPremises k prems) 

Fortunately, we can reuse many of the other functions we have defined above, 
and, using the split and initLast functions, we can get our hands on the list 
of premises prems and the desired return type goal. The only missing piece of 
the puzzle is a function, toPremises, which converts a list of PsTerms to a hint 
database containing rules for the arguments of our goal. 

toPremises : V { k } — > N — » Vec (PsTerm n) k — > HintDB 
toPremises i [] = [ j 

toPremises i (t :: ts) = (n,rule (rvar i) t []) :: toPremises (sue i) ts 

The toPremises converts every PsTerm in its argument list to a rule, using the 
argument’s De Bruijn index as its rule name. 



Reification of proof terms 

Now that we can compute Prolog terms, goals and rules from an Agda Term, 
we are ready to call the resolution mechanism described in Section 3. The only 
remaining problem is to convert the witness computed by our proof search back 
to an AgTerm, which can be unquoted to produce the desired proof. This is 
done by the reify function that traverses its argument Proof; the only interesting 
question is how it handles the variables and names it encounters. 

The Proof may contain two kinds of variables: locally bound variables, rvar i, 
or variables storing an Agda Name, name n. Each of these variables is treated 
differently in the reify function. 

reify : Proof — > AgTerm 

reify (con (rvar i) ps) = var i [] 

reify (con (name n) ps) with definition n 

... | function x = def n (toArg o reify ($) ps) 

... | constructor' = con n (toArg o reify ($) ps) 

... | _ = unknown 

where 

toArg : AgTerm — > Arg AgTerm 
toArg = arg (arg-info visible relevant) 

Any references to locally bound variables are mapped to the var constructor of 
the AgTerm data type. These variables correspond to usage of arguments to the 
function being defined. As we know by construction that these arguments are 
mapped to rules without premises, the corresponding Agda variables do not need 
any further arguments. 

If, on the other hand, the rule being applied is constructed using a name, 
we do disambiguate whether the rule name refers to a function or a construc- 
tor. The definition function, defined in Agda’s reflection library, tells you how 
a name was defined (i.e. as a function name, constructor, etc). For the sake of 
brevity, we restrict the definition here to only handle defined functions and data 
constructors. It is easy enough to extend with further branches for postulates, 
primitives, and so forth. 

We will also need to wrap additional lambdas around the resulting term, due 
to the premises that were introduced by the agda2goalx premises function. To do 
so, we define the intros function that repeatedly wraps its argument term in a 
lambda. 

intros : AgTerm — > AgTerm 
intros = introsAcc (length args) 

where 

introsAcc : N — )■ AgTerm — > AgTerm 
introsAcc zero t = t 

introsAcc (sue k) t = lam visible (introsAcc k t) 



Hint databases 

Users to provide hints, i.e., rules that may be used during resolution, in the form 
of a hint database. These hint databases consist of an (existentially quantified) 
a list of rules. We can add new hints to an existing database using the insertion 
operator, «, defined as follows: 

_«_ : HintDB — > Name — > HintDB 
db « n with name2rule n 
db « n | inji msg = db 
db « n | inj 2 r = db -H- [r] 

If the generation of a rule fails for whatever reason, no error is raised, and the 
rule is simply ignored. Our actual implementation requires an implicit proof ar- 
gument that all the names in the argument list can be quoted successfully. If you 
define such proofs to compute the trivial unit record as evidence, Agda will fill 
them in automatically in every call to the _«_ function on constant arguments. 
This simple form of proof automation is pervasive in Agda programs [Oury and 
Swierstra, 2008, Swierstra, 2010]. 

This is the simplest possible form of hint database. In principle, there is no 
reason not to define alternative versions that assign priorities to certain rules 
or limit the number of times a rule may be applied. We will investigate some 
possibilities for extensible proof search in section 5. 

It is worth repeating that hint databases are first-class objects. We can com- 
bine hints databases, filter certain rules from a hint database, or manipulate 
them in any way we wish. 

Error messages 

Lastly, we need to decide how to report error messages. Since we are going 
to return an AgTerm, we need to transform the Message type we saw previously 
into an AgTerm. When unquoted, this term will cause a type error, reporting the 
reason for failure. To accomplish this, we introduce a dependent type, indexed 
by a Message: 

data Exception : Message — > Set where 
throw : (msg : Message) — > Exception msg 

The message passed as an argument to the throw constructor, will be recorded 
in the Exception’s type, as we intended. 

Next, we define a function to produce an AgTerm from a Message. We could 
construct such terms by hand, but it is easier to just use Agda’s quoteTerm 
construct: 

quoteError : Message — ► Term 
quoteError searchSpaceExhausted = 

quoteTerm (throw searchSpaceExhausted) 
quoteError unsupportedSyntax = 
quoteTerm (throw unsupportedSyntax) 



Putting it all together 


Finally, we can present the definition of the auto function used in the examples 
in Section 2: 

auto : N — > HintDB — » AgTerm — > AgTerm 
auto depth rules goalType 

with agda2goal x premises goalType 
... | inji msg = quoteError msg 
... | inj 2 ((n, g), args) 
with dfs depth (solve g (args -H- rules)) 

... | [] = quoteError searchSpaceExhausted 
... | (p :: _) = intros (reify p) 

The auto function takes an AgTerm representing the goal type, splits it into 
PsTerms representing the goal g and a list of arguments, args. These arguments 
are added to the initial hint database. Calling the solve function with this hint 
database and the goal g, constructs a proof tree, that we traverse up to the given 
depth in search of a solution. If this proof search succeeds, the Proof is converted 
to an AgTerm, a witness that the original goal is inhabited. There are two places 
where this function may fail: the conversion to a PsTerm may fail because of 
unsupported syntax; or the proof search may not find a result. 


5 Extensible proof search 


As we promised in the previous section, we will now explore several variations 
and extensions to the auto tactic described above. 


Custom search strategies 

The simplest change we can make is to abstract over the search strategy used by 
the auto function. In the interest of readability we will create a simple alias for 
the types of search strategies. A Strategy represents a function which searches a 
SearchTree up to depth, and returns a list of the leaves (or Proofs) found in the 
SearchTree in an order which is dependent on the search strategy. 

Strategy = (depth : N) — ► SearchTree A — > List A 

The changed type of the auto function now becomes. 

auto : Strategy — > N — > HintDB — > AgTerm — > AgTerm 

This will allow us to choose whether to pass in dfs, breadth-first search or even 
a custom user-provided search strategy. 



Custom hint databases 


In addition, we have developed a variant of the auto tactic described in the paper 
that allows users to define their own type of hint database, provided they can 
implement the following interface: 


HintDB : Set 

Hint : N — > Set 

getHints : HintDB — > Hints 

getRule : Hint k -4 Rule k 

getTr : Hint k -4 (HintDB -4 HintDB) 


Besides the obvious types for hints and rules, we allow hint databases to evolve 
during the proof search. The user-defined getTr function describes a transforma- 
tion that may modify the hint database after a certain hint has been applied. 

Using this interface, we can implement many variations on proof search. 
For instance, we could implement a ‘linear’ proof search function that removes 
a rule from the hint database after it has been applied. Alternatively, we may 
want to assign priorities to our hints. To illustrate one possible application of this 
interface, we will describe a hint database implementation that limits the usage 
of certain rules. Before we do so, however, we need to introduce a motivating 
example. 


Example: limited usage of hints 

We start by defining the following sublist relation, taken from the Agda tuto- 
rial [Norell, 2009a]: 

data _C_ : List A -4 List A -4 Set where 
stop : [] C [] 

drop : xs C ys 4 xs C y :: ys 
keep : xs C ys -4 x :: xs C x :: ys 

It is easy to show that the sublist relation is both reflexive and transitive — and 
using these simple proofs, we can build up a small hint database to search for 
proofs on the sublist relation. 

hintdb : HintDB 

hintdb = £ « quote drop « quote keep « quote C-refl « quote C-trans 

Our auto tactic quickly finds a proof for the following lemma: 

lemmai : ws C 1 :: xs 4 xs C ys 4 ys C zs 4 ws C 1 :: 2 :: zs 
lemma! = tactic (auto dfs 10 hintdb) 

The following lemma, however, is false. 

Iemma 2 : ws C 1 :: xs 4 xs C ys 4 ys C zs 4 ws C 2 :: zs 
Iemma 2 = tactic (auto dfs 10 hintdb) 



Indeed, this example does not type check and our tactic reports that the search 
space is exhausted. As noted by Chlipala [2013] when examining tactics in Coq, 
auto will nonetheless spend a considerable amount of time trying to construct 
a proof. As the trans rule is always applicable, the proof search will construct a 
search tree up to the full search depth— resulting in an exponental running time. 

We will use a variation of the auto tactic to address this problem. Upon 
constructing the new hint database, users may assign limits to the number of 
times certain hints may be used. By limiting the usage of transitivity, our tactic 
will fail more quickly. 

To begin with, we choose the representation of our hints: a pair of a rule and 
a ‘counter’ that records how often the rule may still be applied: 

record Hint (k : N) : Set where 
field 

rule : Rule k 
counter : Counter 

These counter values will either be a natural number n, representing that the rule 
can still be used at most n times; or T, when the usage of the rule is unrestricted. 

Counter : Set 
Counter = N l±) T 

Next, we define a decrementing function, decrCounter, that returns nothing when 
a rule can no longer be applied: 

decrCounter : Counter — > Maybe Counter 
decrCounter (inj x 0) = nothing 
decrCounter (inji 1) = nothing 
decrCounter (inji x) = just (inji (pred x)) 
decrCounter (injs tt) = just (inj 2 tt) 

Given a hint h, the transition function will now simply find the position of h 
in the hint database and decrement the hint’s counter, removing it from the 
database if necessary. 

We can redefine the default insertion function (_«_) to allow unrestricted 
usage of a rule. However, we will define a new insertion function which will allow 
the user to limit the usage of a rule during proof search: 

_« [_] _ : HintDB — > N — > Name — » HintDB 
db « [0] _ = db 
db « [x] n with (name2rule n) 
db « [x] n | inji msg = db 

db « [x] n | inj 2 (k, r) = db -H- [k, record { rule = r, counter = inji x}] 

We now revisit our original hint database and limit the number of times transi- 
tivity may be applied: 



hintdb : HintDB 
hintdb = £ « quote drop 

« quote keep 

« quote refl 

« [2] quote trans 

If we were to search for a proof of Iemma2 now, our proof search fails sooner. A 
fortiori, if we use this restricted database when searching for a proof of lemma!, 
the auto function succeeds sooner, as we have greatly reduced the search space. 
Of course, there are plenty of lemmas that require more than two applications of 
transitivity. The key insight, however, is that users now have control over these 
issues - something which is not even possible in current implementations of auto 
in Coq. 


6 Discussion 

The auto function presented here is far from perfect. This section not only dis- 
cusses its limitations, but compares it to existing proof automation techniques 
in interactive proof assistants. 

Restricted language fragment The auto function can only handle first-order 
terms. Even though higher-order unification is not decidable in general, we be- 
lieve that it should be possible to adapt our algorithm to work on second-order 
goals. Furthermore, there are plenty of Agda features that are not supported 
or ignored by our quotation functions, such as universe polymorphism, instance 
arguments, and primitive functions. 

Even for definitions that seem completely first-order, our auto function can 
fail unexpectedly. Consider the following definition of the pair type: 

_x_ : (A B : Set) — ► Set 

A x B = IA(A_->B) 

pair : {A B : Set} — >■ A — > B 4 A x B 

pairxy = x, y 

Here a (non-dependent) pair is defined as a special case of the dependent pair 
type Z. Now consider the following trivial lemma: 

andlntro : (A : Set) -4 (B : Set) -y A x B 

Somewhat surprisingly, trying to prove this lemma using our auto function, pro- 
viding the pair function as a hint, fails. The quoteGoal construct always re- 
turns the goal in normal form, which exposes the higher-order nature of A x B. 
Converting the goal (Ax (A _ — > B)) to a PsTerm will raise the ‘exception’ 
unsupportedSyntax; the goal type contains a lambda which causes the proof search 
to fail before it has even started. 



Refinement The auto function returns a complete proof term or fails entirely. 
This is not always desirable. We may want to return an incomplete proof, that 
still has open holes that the user must complete. The difficulty lies with the 
current implementation of Agda’s reflection mechanism, as it cannot generate 
an incomplete Term. 

In the future, it may be interesting to explore how to integrate proof au- 
tomation using the reflection mechanism better with Agda’s IDE. For instance, 
we could create an IDE feature which replaces a call to auto with the proof 
terms that it generates. As a result, reloading the file would no longer need to 
recompute the proof terms. 

Metatheory The auto function is necessarily untyped because the interface of 
Agda’s reflection mechanism is untyped. Defining a well-typed representation of 
dependent types in a dependency typed language remains an open problem, de- 
spite various efforts in this direction [Chapman, 2009, Danielsson, 2006, Devriese 
and Piessens, 2013, McBride, 2010]. If we had such a representation, however, we 
could use the type information to prove that when the auto function succeeds, 
the resulting term has the correct type. As it stands, a bug in our auto function 
could potentially produce an ill-typed proof term, that only causes a type error 
when that term is unquoted. 

Variables The astute reader will have noticed that the tactic we have imple- 
mented is closer to Coq’s eauto tactic than the auto tactic. The difference be- 
tween the two tactics lies in the treatment of unification variables: eauto may 
introduce new variables during unification; auto will never do so. It would be 
fairly straightforward to restrict our tactic to only apply hints when all variables 
known. A suitable instantiation algorithm, which we could use instead of the 
more general unification algorithm in this paper, has already been developed in 
previous work [Swierstra and van Noort, 2013]. 

Technical limitations The auto tactic relies on the unification algorithm and 
proof search mechanism we have implemented ourselves. These are all run at 
compile time, using the reflection mechanism to try and find a suitable proof 
term. It is very difficult to say anything meaningful about the performance of 
the auto tactic, as Agda currently has no mechanism for debugging or profiling 
programs run at compile time. We hope that further advancement of the Agda 
compiler and associated toolchain can help provide meaningful measurements of 
the performance of auto. Similarly, a better (static) debugger would be invaluable 
when trying to understand why a call to auto failed to produce the desired proof. 


Related work 

There are several other interactive proof assistants, dependency typed program- 
ming languages, and alternative forms of proof automation in Agda. In the re- 
mainder of this section, we will briefly compare the approach taken in this paper 
to these existing systems. 



Coq Coq has rich support for proof automation. The Ltac language and the 
many primitive, customizable tactics are extremely powerful [Chlipala, 2013]. 
Despite Coq’s success, it is still worthwhile to explore better methods for proof 
automation. Recent work on Mtac [Ziliani et al., 2013] shows how to add a typed 
language for proof automation on top of Ltac. Furthermore, Ltac itself is not 
designed to be a general purpose programming language. It can be difficult to 
abstract over certain patterns and debugging proof automation is not easy. The 
programmable proof automation, written using reflection, presented here may 
not be as mature as Coq’s Ltac language, but addresses these issues. 

More recently, Malecha et al. [2014] have designed a higher-order reflective 
programming language (MirrorCore) and an associated tactic language (Rtac). 
MirrorCore defines a unification algorithm - similar to the one we have imple- 
mented in this paper. Alternative implementations of several familiar Coq tac- 
tics, such as eauto and setoidrewrite, have been developed using Rtac. The au- 
thors have identified several similar advantages of ‘programming’ tactics, rather 
than using built-in primitives, that we mention in this paper, such as manipu- 
lating and assembling first-class hint databases. 

Idris The dependency typed programming language Idris also has a collection of 
tactics, inspired by some of the more simple Coq tactics, such as rewrite, intros, 
or exact. Each of these tactics is built-in and implemented as part of the Idris 
system. There is a small Haskell library for tactic writers to use that exposes 
common commands, such as unification, evaluation, or type checking. Further- 
more, there are library functions to help handle the construction of proof terms, 
generation of fresh names, and splitting sub-goals. This approach is reminiscent 
of the HOL family of theorem provers [Gordon and Melham, 1993] or Coq’s 
plug-in mechanism. An important drawback is that tactic writers need to write 
their tactics in a different language to the rest of their Idris code; furthermore, 
any changes to tactics requires a recompilation of the entire Idris system. 

Agsy Agda already has a built-in ‘auto’ tactic that outperforms the auto function 
we have defined here [Lindblad and Benke, 2004]. It is nicely integrated with the 
IDE and does not require the users to provide an explicit hint database. It is, 
however, implemented in Haskell and shipped as part of the Agda system. As 
a result, users have very few opportunities for customization: there is limited 
control over which hints may (or may not) be used; there is no way to assign 
priorities to certain hints; and there is a single fixed search strategy. In contrast 
to the proof search presented here, where we have much more fine grained control 
over all these issues. 


Conclusion 

The proof automation presented in this paper is not as mature as some of these 
alternative systems. Yet we strongly believe that this style of proof automation 
is worth pursuing further. 



The advantages of using reflection to program proof tactics should be clear: 
we do not need to learn a new programming language to write new tactics; 
we can use existing language technology to debug and test our tactics; and we 
can use all of Agda’s expressive power in the design and implementation of our 
tactics. If a particular problem domain requires a different search strategy, this 
can be implemented by writing a new traversal over a Search Tree. Hint databases 
are first-class values. There is never any built-in magic; there are no compiler 
primitives beyond Agda’s reflection mechanism. 

The central philosophy of Martin-Lof type theory is that the construction 
of programs and proofs is the same activity. Any external language for proof 
automation renounces this philosophy. This paper demonstrates that proof au- 
tomation is not inherently at odds with the philosophy of type theory. Paraphras- 
ing Martin-Lof [1985], it no longer seems possible to distinguish the discipline of 
programming from the construction of mathematics. 
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